Future nano-scale electronics built up from an Avogadro number of components needs efficient, 
highly scalable, and robust means of communication in order to be competitive with traditional 
silicon approaches. In recent years, the Networks-on-Chip (NoC) paradigm emerged as a promising 
solution to interconnect challenges in silicon-based electronics. Current NoC architectures are either 
highly regular or fully customized, both of which represent implausible assumptions for emerging 
bottom-up self-assembled molecular electronics that are generally assumed to have a high degree 
of irregularity and imperfection. Here, we pragmatically and experimentally investigate important 
design trade-offs and properties of an irregular, abstract, yet physically plausible 3D small-world 
interconnect fabric that is inspired by modern network-on-chip paradigms. We vary the framework's 
key parameters, such as the connectivity, the number of switch nodes, the distribution of long- versus 
short-range connections, and measure the network's relevant communication characteristics. We 
further explore the robustness against link failures and the ability and efficiency to solve a simple toy 
problem, the synchronization task. The results confirm that (1) computation in irregular assemblies 
is a promising and disruptive computing paradigm for self-assembled nano-scale electronics and (2) 
that 3D small-world interconnect fabrics with a power-law decaying distribution of shortcut lengths 
are physically plausible and have major advantages over local 2D and 3D regular topologies. 



I. INTRODUCTION 

It is generally expected that without disruptive new 
technologies, the ever-increasing computing performance 
(commonly known as Moore's law [12]) and the storage 
capacity achieved with existing technologies will eventually
reach a plateau [31]. However, there is a lack
of consensus on what type of technology and computing
architecture holds the most promise to keep up the current
pace of progress. Among the most contemplated future
and emerging technologies are quantum computers,
molecular electronics, nano-electronics, optical comput- 
ers, and quantum-dot cellular automata (QCA). In this 
paper, we will primarily focus on self-assembled nano- 
scale electronics based on nanowires or nanotubes be- 
cause these fabrication technologies have become quite 
mature on the physical and device level. It is, however, 
still unclear how to develop higher-level computational 
architectures in a reliable way, although a number of 
promising approaches have been explored in detail (e.g.,
[?]). As Chen et al. [?] state, "[i]n order to realize
functional nano-electronic circuits, researchers need
to solve three problems: invent a nanoscale device that 
switches an electric current on and off; build a nanoscale 
circuit that controllably links very large numbers of these 
devices with each other and with external systems in or- 
der to perform memory and/or logic functions; and de- 
sign an architecture that allows the circuits to commu- 
nicate with other systems and operate independently on 
their lower-level details." 

Building a scalable computing architecture on top of 
a potentially very unreliable physical substrate, such as
for example molecular electronics, is a challenging task,
which is guided by a number of major trade-offs in the
design space [53], such as the number and the characteristics
of the resources available, the required performance,
the energy consumption, and the reliability. The lack of 
systematic understanding of these issues and of clear de- 
sign methodologies makes the process still more of an art 
than of a scientific endeavor and the appearance of novel 
and non-standard physical computing devices (e.g., [56] ) 
generally only aggravates these difficulties. 

In recent years, the importance of interconnects on 
electronic chips has outrun the importance of transistors 
as a dominant factor of performance [?]. The
reasons are twofold: (1) the transistor switching speed 
for traditional silicon is much faster than the average 
wire delays and (2) the required chip area for intercon- 
nects has dramatically increased. The ITRS roadmap 
[1] lists a number of critical challenges for interconnects
and states that "[i]t is now widely conceded that tech- 
nology alone cannot solve the on-chip global interconnect 
problem with current design methodologies." The major 
challenges are related to delays of non-scalable global in- 
terconnects and reliability in general, which leads to the 
observation that simple scaling will no longer satisfy performance
requirements as feature sizes continue to shrink.

In this paper, we experimentally and pragmatically in- 
vestigate a certain class of irregular and physically plau- 
sible 3D interconnect fabrics, which are likely to be easily
and cheaply implementable by self-assembling either
nanowires or nanotubes. We will vary the framework's 
key parameters, such as the connectivity, the number of 
switch nodes, the distribution of long- versus short-range 
connections, and measure the network's relevant commu- 
nication characteristics and the robustness against fail- 
ures. Further, we will compare its performance with regular
and nearest-neighbor connected 2D and 3D cellular-automata-like
interconnect fabrics. We will also evaluate
and compare the performance of a simple task that is 
frequently used in the cellular automata community, the 
synchronization task. 

The motivation for investigating alternative and more
biologically-inspired interconnects that can be self-assembled
easily and cheaply can be summarized by the
following observations:

• long-range and global connections are costly (in 
terms of wire delay and of the chip area used) and 
limit system performance [24] ; 

• it is unclear whether a precisely regular and homogeneous
arrangement of components is needed
and possible on a multi-billion-component or even
Avogadro-scale assembly of nano-scale components [?];

• "[s]elf-assembly makes it relatively easy to form 
a random array of wires with randomly attached 
switches" [55]; and 

• building a perfect system is very hard and expensive.

By using an abstract, yet physically plausible and 
fabrication-friendly nanoscale computing framework, we 
show that self-assembled interconnect fabrics with small-world
[58] properties have major advantages in terms
of performance and robustness over purely regular and
nearest-neighbor connected fabrics, such as cellular-automata-like
topologies (sometimes called NEWS communication,
standing for north, east, west, south). While
there is ample evidence of the superior communication 
characteristics of small-world and power-law over locally
connected topologies (see also Section II), most abstract
models are not physically plausible and are thus of limited
significance for real-world implementations. For example,
it is very unrealistic to assume a uniform rewiring
probability (as in the original Watts-Strogatz model [55])
over all possible nodes. Spatial aspects of small-world
topologies and wiring-cost perspectives have only recently
gained more attention [33], [?]. We
call such interconnect topologies nature- or bio-inspired 
because they are physically plausible and similar network 
topologies are widespread in natural systems. 

The goal of this paper is to experimentally investi- 
gate important design trade-offs of self-assembled inter- 
connect fabrics for emerging nano-scale electronics. The 
main contribution consists in a pragmatic comparison of 
regular (both 2D and 3D) versus irregular small-world 
topologies of physically plausible self-assembled network- 
on-chip (NoC) interconnect fabrics that are inspired by 
natural networks. We believe that the results will help 
to make important design decisions for future bottom-up 
self-assembled computing architectures. The question of 
how much interconnect we need and how one can efficiently
build (or rather self-assemble) it is a very important
question, not only for chip design and future
molecular electronics in particular, but also for any massively
parallel computing architecture in general.

The remainder of the paper is organized as follows: Section II
provides a brief introduction to complex networks and
modern network-on-chip (NoC) paradigms. Section III
describes the framework that we use, including the topologies,
the wire and node model, and how physically
plausible the approach is. Section IV reports on five simple
experiments which illustrate the main findings, while
Section V concludes the paper.



II. NETWORKS AND NETWORKS-ON-CHIP 
A. Complex Networks and Wiring Costs 

Most real networks, such as brain networks [?],
electronic circuits [21], the Internet, and social networks 
share the so-called small-world (SW) property [58]. Compared
to purely locally and regularly interconnected networks
(such as for example the cellular automata interconnect),
small-world networks have a very short average
distance between any pair of nodes, which makes them
particularly interesting for efficient communication. 

The classical Watts-Strogatz small-world network [55] 
is built from a regular lattice with only nearest neighbor 
connections. Every link is then rewired with a rewiring 
probability p to a randomly chosen node. Thus, by varying
p, one can interpolate between a fully regular (p = 0) and a fully
random (p = 1) network topology. The rewiring procedure
establishes "shortcuts" in the network, which significantly
lower the average distance (i.e., the number
of edges to traverse) between any pair of nodes. In the
original model, the length distribution of the shortcuts is
uniform since a node is chosen randomly. If the rewiring
of the connections is done proportional to a power law,
$l^{-\alpha}$, where $l$ is the wire length, then we obtain a small-world
power-law network. The exponent $\alpha$ affects the network's
communication characteristics [33] and navigability
[32], which is better than in the uniformly generated
small-world network. One can think of other distance- 
proportional distributions for the rewiring, such as for 
example a Gaussian distribution, which has been found 
between certain layers of the rat's neocortical pyramidal 
neurons [23]. Studying the connection probabilities and
the average number of connections in biological systems, 
especially in neural systems, can give us important in- 
sights on how nearly optimal systems evolved in Nature 
under limited resources and various other physical con- 
straints. 
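
As a minimal sketch of the distance-dependent rewiring described in this section, the following Python fragment rewires a nearest-neighbor ring lattice. The ring lattice, the function names `ring_distance` and `rewire_ring`, and the choice to forbid duplicate edges are assumptions made for illustration only; they are not taken from the framework used in this paper. Setting `alpha = 0` gives the uniform target choice of the original Watts-Strogatz model, while larger values of `alpha` favor short shortcuts.

```python
import random


def ring_distance(i, j, n):
    """Shortest distance between positions i and j on a ring of n nodes."""
    d = abs(i - j)
    return min(d, n - d)


def rewire_ring(n, p, alpha, seed=0):
    """Rewire a nearest-neighbor ring lattice (illustrative sketch).

    Each edge (i, i+1) is rewired with probability p. The new endpoint k is
    drawn with weight ring_distance(i, k)**(-alpha), i.e. a power law in the
    wire length. Edges are stored as sorted pairs in a set.
    """
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        j = (i + 1) % n
        if rng.random() < p:
            # Candidate endpoints: any other node not already linked to i.
            candidates = [k for k in range(n)
                          if k != i and (min(i, k), max(i, k)) not in edges]
            weights = [ring_distance(i, k, n) ** (-alpha) for k in candidates]
            j = rng.choices(candidates, weights=weights, k=1)[0]
        edges.add((min(i, j), max(i, j)))
    return edges
```

With `p = 0` the function returns the regular ring unchanged; with `p = 1` every edge is replaced by a shortcut whose length follows the chosen power law.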

In a real network, it is fair to assume that local con- 
nections have a lower cost (in terms of the associated 
wire-delay and the area required) than long-distance con- 
nections. Physically realizing small-world networks with 
uniformly distributed long-distance connections is thus
not realistic, and distance, i.e., the wiring cost, needs to
be taken into account, a perspective that recently gained
increasing attention [?]. On the other hand, a network's
topology also directly affects how efficiently problems
can be solved. For example, it has been shown
that both small-world [53] and random Erdős-Rényi
topologies [41] offer better performance than regular lattices
and are easier to evolve to solve the global synchronization
and density classification tasks, two toy problems
commonly used in the cellular automata community.

In summary, there is a trade-off between (1) the physical
realizability and (2) the communication characteristics
of a network topology. A locally and regularly
interconnected topology is in general easy to build and
only involves minimal wire and area cost, but it offers
poor global communication characteristics and scales
poorly with system size. On the other hand, a random
Erdős-Rényi topology scales up well and has a very short
average path length, but it is not physically plausible because
it involves costly long-distance connections established
independently of the Euclidean distance between
the nodes. 



B. Addressing Interconnect Challenges by 
Networks-on-Chip 

The topic of interconnect networks for computers and 
chips is vast and complex. Here, we give a brief
(and certainly incomplete) overview of communication
on chips. From a bird's eye view, the main challenge of
interconnect fabrics consists in transferring data between 
two points of the chip with a minimal latency, minimal 
energy consumption, and maximal reliability. This job 
can obviously be done in a wide variety of ways. Com- 
pared to computer-to-computer networks, one has to gen- 
erally deal with a more restrictive set of resources and 
with more constraints to consider. The balance between 
communication and computation is guided by numerous 
design trade-offs and is key to performance. For exam- 
ple, a set of fast processors is useless if you cannot get 
enough data to them on time. 

Traditional VLSI (Very Large-Scale Integration) de- 
sign uses an ad-hoc and monolithic communication fabric 
that connects different resources (such as for example the 
memory and the ALU) on the chip together by dedicated 
wires. With increasing system complexity and the con- 
tinuing miniaturization of the technology, radically new 
interconnect approaches will be necessary if we want to 
sustain the current pace of progress [40]. Two main factors
potentially limit performance [15], [24]: (1) the miniaturization
of wires, unlike transistors, does not enhance
their performance, which is why wires are now more important
than transistors [40], and (2) global wires that
communicate signals across the whole chip increase de- 
lays and therefore limit the system scalability. The 2005 
ITRS roadmap [2] (and the 2006 update) lists a number of
critical challenges for interconnects in more detail. In
recent years, true 3D architectures and associated design
methodologies have emerged, which offer an attractive
option to address some of the current interconnect challenges
[?].

On the other hand, networks-on-chip (NoC) [6], [16]
have been proposed as a promising solution to address the
on-chip communication challenges and to cope with the
increasing communication requirements. The basic idea
behind this paradigm is that the different modules on the
chip (e.g., IP cores) are interlinked by a bus-like communication
network with programmable switch blocks that
support packet-oriented traffic. Thus, NoC architectures
decouple the communication fabric from the processing
and storage elements [?] and provide a more modular
view of the system, which makes it easier to master the
complexity of large-scale systems, to decompose them into
independent sub-systems, and to keep things flexible. The
additional communication fabric obviously results in an 
overhead of area and energy dissipation, which the de- 
signer has to consider in addition to all other design 
trade-offs. The overhead largely depends on the connec- 
tivity, the number of switch blocks, their complexity, and 
the number of possible repeaters. Pande et al. provide 
some estimates for their framework [46]. 

The NoC approach is very general and allows for any
interconnect architecture between functional and communication
blocks, such as local and regular,
fat-tree, hypercube, irregular and application-specific,
or small-world topologies as described in Section II A (see
[?] for some examples). Field-Programmable Gate
Arrays (FPGAs), for example, offer a regular arrangement
of programmable logic blocks that are interconnected
by a programmable communication fabric, which
introduces a great degree of freedom for the application
designer. However, once the NoC topology is selected,
the only remaining degree of freedom is the routing strategy.

In the following, we will make use of the NoC paradigm 
in combination with nature-inspired 3D network topologies
for our explorative framework. Very recently,
other researchers investigated the idea of inserting long-range
connections into otherwise regular networks-on-chip.
Ogras et al. [33] showed that a significant reduction in
the average packet latency can be achieved by superposing
a few long-range links onto a standard mesh network. In
their approach, however, the links are not inserted at random
but where they are most useful. Oshida and Ihara
[35] investigated the performance of scale-free network-on-chip
topologies. Their findings show that short latencies
and a low packet loss ratio can be achieved. The parallel
processing community has also looked into improving
large-scale multi-computer interconnects by adding
shortcuts to bypass nodes [?]. Fuks and Lawniczak
[20] and Lawniczak et al. [34] examined more generally
how the introduction of additional random links influences
the performance of computer networks. For all
these approaches, the performance strongly depends on
the routing algorithm and whether it is able to efficiently
use the provided "shortcuts" in the networks while avoiding
congestion at the same time.



III. DESCRIPTION OF THE FRAMEWORK

We are interested in experimentally exploring self- 
assembled networks-on-chip architectures that are built 
in a largely random manner. If we want nano-scale elec- 
tronics to become a success, we have to show that we 
can (1) build systems that involve an Avogadro number 
of components, and (2) that such a system can efficiently 
and robustly solve a specific task. In the absence of math- 
ematical models for self-assembled electronics (such as for 
example nanowire growth models), we decided to build 
a toy framework that would allow us to experimentally 
explore the properties and design trade-offs we are inter- 
ested in. The framework also allows us to quantitatively 
compare the irregular, self-assembled fabrics with representative
regular nearest-neighbor fabrics.

In the following sections, the network-on-chip-like
framework and the evaluation methodology, which is inspired
by Pande et al. [15], are described in more
detail.




[Figure 1 panels: 3D random multitude; 3D cellular automata; 2D cellular automata]



FIG. 1: Top left: a 3D random multitude (RM) example 
composed of processing nodes (PNs), switch nodes (SNs), and 
interconnections. Top right: a 3D CA grid-like architecture. 
Bottom: a 2D CA grid-like architecture. 



A. Node and Link Model 

The basic system-on-chip-like architecture that we use 
is composed of (1) programmable computing elements, 
called processing nodes (PNs), and (2) of an associated 
switch-based interconnect fabric, which is itself composed 
of (3) switch nodes (SNs) and (4) bi-directional point-to- 
point interconnects among them. Both processing and 
switch nodes can be considered as simple modules of a 
large-scale system that need to communicate efficiently 
among each other. Figure 1 shows three different arrangements.
The interconnect topologies are described
in the next section.

Each switch node can transmit messages in parallel
on at most C different virtual channels to its neighbors
(see e.g. [46] for more details about the concept of virtual
channels) according to a specific routing scheme.
No further information processing is done in the switch
nodes. We assume that they can temporarily store a
limited number of M messages. For the sake of simplicity,
we have chosen this number to be large enough, i.e.,
M = 100, to handle our simulations without causing
congestion or losing messages. The processing nodes, on
the other hand, simply send and receive messages accord- 
ing to a specific traffic scheme. Since we are interested 
in interconnect issues here, we do not further specify or 
limit the processing nodes' computing capacity. 
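
The switch node model above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `SwitchNode`, the method names, the first-in-first-out forwarding order, and the policy of dropping a message when the buffer is full are hypothetical and are not specified in the text; only the parameters C (virtual channels) and M (buffer size) come from the description above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SwitchNode:
    """Illustrative switch node with C virtual channels and a buffer of M messages."""
    channels: int = 6      # C: maximum number of messages forwarded per update
    capacity: int = 100    # M: maximum number of temporarily stored messages
    buffer: deque = field(default_factory=deque)

    def receive(self, message):
        """Store a message; return False if the buffer is full (assumed drop policy)."""
        if len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(message)
        return True

    def forward(self):
        """Remove and return up to `channels` buffered messages in FIFO order."""
        count = min(self.channels, len(self.buffer))
        return [self.buffer.popleft() for _ in range(count)]
```

A routing scheme would then decide, for every message returned by `forward`, which neighboring switch node receives it next.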



B. Network Topologies 

We have decided to compare the following six reference
network topologies in order to quantitatively evaluate the
self-assembled and irregular fabrics:

• 2DCA: 2D (unfolded) regularly arranged and locally
interconnected (von Neumann neighborhood),
see Figure 1;

• 3DCA: 3D (unfolded) regularly arranged and locally
interconnected (6 neighbors per switch node),
see Figure 1;

• 3DRMStandard: 3D random arrangement, small-world,
power-law, $\alpha = 1.8$;

• 3DRMLocal: 3D random arrangement, small-world,
power-law, $\alpha = 3$ (locally interconnected);

• 3DRMGlobal: 3D random arrangement, small-world,
power-law, $\alpha = 0$ (globally interconnected);
and

• 3DRMRealistic: 3D random arrangement, small-world,
power-law, $\alpha = 1.8$, upper limit $k_{max}$ on the
number of connections per node, independently of
the average connectivity.

We call a 3D random arrangement a random multitude
(RM). Figure 1 depicts a 3D random multitude, a 2D CA,
and a 3D CA topology. We do not use folded versions for
the cellular-automata-like topologies because that would
require long-distance connections. For both 2D and 3D
CAs, the processing nodes are regularly arranged in
2D or 3D Euclidean space inside a unit square or a unit
cube, respectively. The number of processing nodes is equal to
the number of switch nodes, and each processing node
is connected to its associated switch node by a single
connection of 0.01 unit length.
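
The regular reference topologies can be generated with a few lines of Python. The sketch below builds an unfolded (non-periodic) grid with nearest-neighbor links only. The function name `ca_lattice` and the convention of spreading the nodes evenly so that the grid spans the whole unit square or cube are assumptions; the text does not state the exact node coordinates.

```python
from itertools import product


def ca_lattice(side, dim=3):
    """Return (positions, edges) of an unfolded CA-like grid with side**dim nodes.

    Nodes are linked only to their nearest neighbors along each axis
    (von Neumann neighborhood in 2D, 6 neighbors in 3D); there are no
    wrap-around links, so boundary nodes have fewer neighbors.
    """
    coords = list(product(range(side), repeat=dim))
    index = {c: i for i, c in enumerate(coords)}
    # Assumed placement: the grid spans the unit square or unit cube.
    step = 1.0 / (side - 1) if side > 1 else 0.0
    positions = [tuple(x * step for x in c) for c in coords]
    edges = []
    for c in coords:
        for axis in range(dim):
            neighbor = list(c)
            neighbor[axis] += 1
            neighbor = tuple(neighbor)
            if neighbor in index:  # skip neighbors outside the grid
                edges.append((index[c], index[neighbor]))
    return positions, edges
```

For example, `ca_lattice(4, dim=3)` and `ca_lattice(8, dim=2)` both produce 64 switch nodes, which makes the 2D and 3D grids directly comparable.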

For the random multitude, both processing and switch
nodes are randomly arranged in 3D space, as illustrated
in Figure 1 (top left). Both the arrangement and the
wire topology are inspired by self-assembled nano-scale
electronics, as we will see in Section III E. To make comparisons
with the CA grid-like architectures easier, we assume
that each processing node is connected to the nearest
switch node by a single connection only. The switch
nodes are connected among themselves by a small-world
power-law network [30], [48], [49] with average connectivity
$\langle k_S \rangle$, i.e., each switch node establishes $\langle k_S \rangle$ connections
on average with its neighbors, with probability proportional to $l^{-\alpha}$,
where $l$ is the Euclidean distance between the two switch
nodes in question. Thus, the bigger $\alpha$, the more local the
connectivity. For $\alpha = 0$, we obtain the original Watts-Strogatz
small-world topology. The choice of $\alpha = 1.8$ was
guided by the experiments and will be further explained
in Section IV B. For all our experiments, we also make
sure that the graphs do not contain disconnected parts.

Algorithm 1 summarizes the construction of a random
multitude with average connectivity $\langle k_S \rangle$ from an
algorithmic point of view. For the restricted version
(3DRMRealistic), there is an upper limit $k_{max}$ on the
number of connections per node that can be
established. The idea behind this restriction is
to make the topology more physically plausible. More
details are given in Section III E.

Algorithm 1 Construction of a 3D random multitude (RM)

 1: Randomly position N processing nodes within a 1 x 1 x 1 unit cube (at distinct locations).
 2: Randomly position S switch nodes within a 1 x 1 x 1 unit cube (at distinct locations).
 3: nbLinks = $\langle k_S \rangle \times S$
 4: for each processing node p = 1 to N do
 5:     Connect p to its nearest switch node.
 6: end for
 7: for each link l = 1 to nbLinks do
 8:     Randomly choose a switch node s.
 9:     Choose a neighboring node d proportional to $l^{-\alpha}$, where $l$ is the Euclidean distance between s and d.
10:     if "restricted" multitude type (3DRMRealistic) then
11:         if $k_{max}$ reached for node s or d then
12:             Do not establish a link.
13:         else
14:             Establish a bi-directional connection between s and d.
15:         end if
16:     else
17:         Establish a bi-directional connection between s and d.
18:     end if
19: end for
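
Algorithm 1 translates almost line by line into Python. The sketch below is one possible reading, not the implementation used for the experiments: the function name and signature are hypothetical, positions are drawn uniformly (so distinct locations are assumed rather than enforced), repeated links between the same pair of switch nodes are allowed because Algorithm 1 does not exclude them, and the connectivity check mentioned in the text is omitted.

```python
import math
import random


def build_random_multitude(n_proc, n_switch, k_avg, alpha, k_max=None, seed=0):
    """Sketch of Algorithm 1; returns (proc, switch, attach, links)."""
    rng = random.Random(seed)
    # Steps 1-2: random positions in the unit cube (assumed to be distinct).
    proc = [(rng.random(), rng.random(), rng.random()) for _ in range(n_proc)]
    switch = [(rng.random(), rng.random(), rng.random()) for _ in range(n_switch)]

    # Steps 4-6: attach every processing node to its nearest switch node.
    attach = [min(range(n_switch), key=lambda s: math.dist(p, switch[s]))
              for p in proc]

    # Steps 3 and 7-19: draw nbLinks = <k_S> * S candidate links.
    degree = [0] * n_switch
    links = []
    for _ in range(int(k_avg * n_switch)):
        s = rng.randrange(n_switch)
        others = [d for d in range(n_switch) if d != s]
        # Weight each candidate by (Euclidean distance) ** (-alpha).
        weights = [math.dist(switch[s], switch[d]) ** (-alpha) for d in others]
        d = rng.choices(others, weights=weights, k=1)[0]
        if k_max is not None and (degree[s] >= k_max or degree[d] >= k_max):
            continue  # restricted multitude: do not establish the link
        links.append((s, d))  # bi-directional connection between s and d
        degree[s] += 1
        degree[d] += 1
    return proc, switch, attach, links
```

Calling `build_random_multitude(64, 64, 6, 1.8)` corresponds to the standard multitude; passing `k_max=10` gives the restricted variant, in which some of the drawn links are rejected.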



C. Routing and Traffic Models

Once a topology is chosen and fixed, the only free
parameter available is essentially the routing strategy,
which plays a crucial role for the overall system performance.
There exists a large number of different routing
strategies and flavors, which allow packets to be sent along
pre-specified or dynamically chosen paths in a given network
from a source to a destination. Whether throughput,
latency, or any other property needs to be optimized
highly depends on the application. In general, an ideal
routing algorithm is adaptive, decentralized, robust, and
guarantees quality-of-service (QoS) within well-defined
bounds.

Here, we deal with packet routing only, and the switch
nodes therefore have to know where to route a packet
that they receive. The path information can be obtained
dynamically or statically, based on the available information
on the network's topology. Shortest path routing
is frequently used, but it is not necessarily the best routing
strategy [51] since congestion has to be taken into
account.

Efficient search and information transfer on complex
networks while avoiding congestion and optimizing
throughput and latency at the same time are of great importance
in real-world systems [?], [12], [13], [35], [50], [64]. It
has also been shown that small-world and scale-free networks
offer great communication characteristics, are efficient
to navigate [35], [57], and reduce congestion [55], [59].
Routing based on local information only (e.g., [12], [57]) is
of particular interest for large-scale systems where global
path information is either not available or too costly to
store in each node.

Here, we are more interested in exploring the extremes
than in using any complicated and highly optimized routing
algorithm. We use two main routing strategies: (1)
shortest path routing and (2) random wandering. Shortest
path routing is optimal if the traffic is low and no congestion
occurs, but every node needs to store a routing
table that can get considerably large for large networks.
In random wandering, the switch node that receives a
message simply sends it to a randomly chosen neighbor.
This is very simple to implement and robust against link
and node failures, but very inefficient for larger system
sizes. We have also explored ant routing [8] as an alternative,
which essentially allows shortest paths to be built in
a decentralized manner by the messages (i.e., the "ants")
themselves.

We decided to adopt a very simple (and admittedly
not very realistic) traffic model that is nevertheless widely
used to evaluate networks: uniform random traffic. Every
processing node injects a message addressed to a randomly
chosen processing node into the network with probability
$p_i$ at each time step. If this injection rate $p_i$ is 1, a message
will be generated during every time step by every
node.



D. Performance Metrics


To compare the different network-on-chip topologies, 
we use standard performance metrics and an evaluation 
methodology inspired by Pande et al. [35]. While area
overhead and especially energy consumption are increasingly
important, we are not focusing on these aspects
here. We are mainly concerned with (1) the average number
of hops a message has to take on the shortest path
from a source to a destination, i.e., the number of switch 
nodes in the path, (2) the latency, and (3) the shortest 
path length (in distance units). Although throughput is 
important, we assume that for our applications, we have 
low traffic that does not lead to congestion. 

The average number of hops is measured over T simula- 
tion time steps and over all messages sent. The average 
shortest path length is measured in distance units and 
takes into account the paths between all possible combi- 
nations of processing nodes. The throughput of a switch 
node is measured in messages per number of updates per 
switch node. 
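
For an unweighted switch network, the average number of hops on shortest paths can be obtained with a breadth-first search from every node. The sketch below assumes a connected network given as an adjacency dictionary, counts traversed links rather than visited switch nodes, and averages over all ordered node pairs instead of over the messages of a simulation run; the function names are hypothetical.

```python
from collections import deque


def hop_counts(adjacency, source):
    """Breadth-first search: number of links from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_hops(adjacency):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes."""
    total = 0
    pairs = 0
    for s in adjacency:
        for t, h in hop_counts(adjacency, s).items():
            if t != s:
                total += h
                pairs += 1
    return total / pairs
```

The same search, run on link lengths instead of hop counts (for instance with Dijkstra's algorithm), would yield the shortest path length in distance units.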

Unless otherwise stated, our experiments used S =
N = 64, a synchronous node update, 6 virtual channels
per node (i.e., a 3D CA-node could send a message into
all directions simultaneously), an average switch node
connectivity of $\langle k_S \rangle = 6$, an exact processing node
connectivity of 1, a traffic injection rate of $p_i = 0.1$ (i.e.,
each node sends a message every 10 time steps on average),
and a maximum connectivity of $k_{max} = 10$ for the
realistic multitude 3DRMRealistic. In our framework, a
message is an abstract entity and we do not take into 
account its size in terms of number of bits. 
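
To make the traffic model and the random wandering strategy of Section III C concrete, the following sketch generates one time step of uniform random traffic and lets a single abstract message wander through a network. The function names, the exclusion of the source as its own destination, and the `max_steps` cut-off are assumptions added for illustration; buffering and virtual channels are ignored here.

```python
import random


def inject_traffic(n_nodes, p_inject, rng):
    """One time step of uniform random traffic as a list of (source, destination)."""
    messages = []
    for src in range(n_nodes):
        if rng.random() < p_inject:
            # Assumption: a node never addresses a message to itself.
            dst = rng.choice([n for n in range(n_nodes) if n != src])
            messages.append((src, dst))
    return messages


def random_wander(adjacency, start, target, rng, max_steps=10_000):
    """Forward a message to randomly chosen neighbors until it reaches `target`.

    Returns the number of hops taken, or None if `max_steps` is exceeded.
    """
    node, hops = start, 0
    while node != target:
        if hops >= max_steps:
            return None
        node = rng.choice(adjacency[node])
        hops += 1
    return hops
```

With the default parameters quoted above, `inject_traffic(64, 0.1, random.Random(0))` produces about 6.4 messages per time step on average.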



E. Physical Plausibility and Realizability 

There exists an abundance of abstract computing mod- 
els that are either hard or impossible (e.g., when infinite 
resources or time is involved) to physically realize. Build- 
ing computers is about hijacking the underlying material 
in order to make it do the things we want. Today, the 
vast majority of fabrication processes is top-down ori- 
ented and given a computing architecture, the goal is 
to successfully realize it, for example by using silicon- 
based electronics. However, with the ongoing minia- 
turization and the constant need for more computing 
power, there has been an increasing interest in bottom- 
up assembling techniques and disruptive new computing 
concepts and methodologies. Especially self-assembling 
molecular electronics, based for example on nanowires or 
nanotubes, bears unique challenges and opportunities for 
new paradigms. 

Despite important progress, the fabrication of ordered 
3D hierarchical nano-structures remains very challenging 
[53]. Here, we argue that because of fewer physical con- 
straints, computing architectures that are "assembled" 
in a largely random manner, are easier and cheaper to 
build than highly regular architectures, such as crossbars
or cellular-automata-like assemblies, which usually
require a perfect or almost perfect establishment of the
connections. Self-assembly, for example, is particularly
well suited for building random structures [65]. Power-law
connection-length distributions have been observed
in many systems created through self-organization, such
as the human cortex or the Internet, and they can be 
considered "physically realizable" [35]. Such topologies
evolve naturally in Nature, essentially because of the cost 
associated with long distance connections, which pre- 
vents a uniform wiring probability over all nodes. 

There is very little work about computing architectures 
with irregular assemblies of connections and components. 
Tour et al. [53], for example, explored the possibility
of computing with randomly assembled, easily realizable 
molecular switches, that are only locally interconnected, 
however. On the other hand, Hogg et al. [55] present 
an approach to build reliable circuits by self-assembly 
with some random variation in the connection location. 
With the exception of a few researchers (e.g., [?]),
the vast majority working in the field of nano-scale electronics
tries to build regular structures, which allow for a
more or less straightforward mapping of higher-level computing
architectures. Computing with highly irregularly
assembled physical substrates is undoubtedly a new and 
disruptive paradigm. The main question we would like 
to address to some extent in this section is whether and 
how the framework as described above could be physi- 
cally implemented. 

Designing nanoscale interconnects is guided by a number
of major and interdependent trade-offs: (1) the number
of long(er)-distance connections, (2) the physical
plausibility, and (3) the efficiency of communication. Since the
fabrication of nano-scale computing architectures tends
to be very challenging, we opt for a fabrication-friendly
approach and try to live with what can currently be
built. Although we are unable to provide experimental
results at this point, we briefly sketch plausible approaches
for physical realizations here. Preliminary
experiments with both nanowires and nanotubes are
currently underway at the Los Alamos National Laboratory.

We believe that a random multitude would today best be
realized in a hybrid way, where the processing and
switch nodes are, for example, made of current (nanoscale)
silicon. Since we are focusing on the interconnections
here, we do not further specify the characteristics of these
nodes. Our only intention is to keep both the computation
and the routing as simple as possible to minimize the
node's complexity. The interconnect fabric would then be
gradually self-assembled using either nanowires or
nanotubes [25]. We imagine that in a first step, both the
processing and the switch nodes would either be randomly
placed in a scaffolding structure or be submerged in some
kind of fluid, similar to the self-assembly of nanowires
from a solution [10]. Each processing and switch node
will have a limited number of seed points, to which either
nanowires or nanotubes could connect or from which they
could grow, for example by a self-catalytic growth process.
The wires would grow in random directions and eventually
make contact with another switch or processing node. The
growth could be guided by additional scaffolding structures
or, for example, by magnetic and electric fields.
Ye et al. [62], for example, present an approach for the
directed assembly by means of electrodeposition or vapor
deposition. As an alternative to directly growing wires
from seed points on the nodes, one could also imagine
pre-fabricating the wires and then connecting them to the
nodes in a second step, for example while being immersed
in a solution.

To obtain a specific power-law distribution of connection
lengths, and thus a small-world topology as described
above, one could imagine growing different wires with
different probabilities, such that their lengths follow
a power-law distribution. Alternatively, if the wires
grow directly out of the nodes and randomly connect to
neighboring nodes, we hypothesize that it is possible to
obtain a desired distribution as a function of the
Euclidean distance between the nodes by imposing
restrictions on the wire growth lengths. However, physical wire
growth models or experimental results would be necessary
to further investigate this option. Note also that
current nanowires tend to be rather short because of their
high resistance and the probability of breaks, which will
naturally limit the number of long-distance connections.

In order to make our framework as realistic as possible,
we introduced in Section III B a limitation k_max on the
number of links that a node can carry for the
3DRMRealistic random multitude. Given the above growth
mechanisms, this is a realistic assumption because one cannot
assume an unlimited number of contacts on a given node.
The exact value of k_max will depend on the wire type and
the fabrication technology. For all our experiments, we
use a value of k_max = 10, which seems rather pessimistic,
but allows for a plausible comparison with the
unrestricted random multitude, 3DRMStandard.
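
The distance-dependent wiring with a contact limit can be illustrated with a short sketch. It is not the construction algorithm of Section III B (which is not reproduced here) but a simplified stand-in under stated assumptions: nodes are placed uniformly at random in a unit cube, each node initiates a fixed number of links (the hypothetical parameter links_per_node), and a partner is drawn with probability proportional to $l^{-\alpha}$. The function name and all parameter names are chosen for illustration only.

```python
import math
import random

def build_random_multitude(n, alpha, links_per_node=3, k_max=None, seed=0):
    """Illustrative sketch: wire n nodes in the unit cube with a
    probability proportional to l**(-alpha), where l is the Euclidean
    distance. If k_max is given, no node carries more than k_max links."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for _ in range(links_per_node):
            if k_max is not None and len(adj[i]) >= k_max:
                break  # node i has no free contact left
            # Candidates: unlinked nodes that still have a free contact.
            cands = [j for j in range(n)
                     if j != i and j not in adj[i]
                     and (k_max is None or len(adj[j]) < k_max)]
            if not cands:
                break
            weights = [math.dist(pos[i], pos[j]) ** (-alpha) for j in cands]
            j = rng.choices(cands, weights=weights, k=1)[0]
            adj[i].add(j)
            adj[j].add(i)
    return pos, adj

# Assumed example values: 64 nodes, alpha = 1.8, at most 10 links per node.
pos, adj = build_random_multitude(64, alpha=1.8, k_max=10)
```

With alpha = 0 all weights are equal, which corresponds to the uniform (global) wiring. Larger values of alpha make long links increasingly unlikely.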

In summary, we believe that there exist several promising
paths to physically realize irregularly self-assembled
networks of wires and nodes with a specific topology.
The random multitude construction algorithm (see
Section III B) of our framework is meant to reflect what we
could possibly assemble in reality in the very near future.



IV. EXPERIMENTS 

In the following sections, we perform a number of
pragmatic and simple experiments with the goal of
comparing the performance of realistic regular and
irregular network-on-chip topologies as presented in Section
III B. All simulations were written in Matlab.



A. Experiment 1: System Scalability 

The goal of this first experiment is to illustrate how
the different topologies perform as the system size scales
up. What works for N = 64 nodes does not necessarily
work for N = 10000 nodes. As we have seen before,
scalability is a critical issue for nano-scale systems because it
is generally very easy to build systems that involve huge
numbers of components, e.g., Avogadro-scale systems. If
the communication fabric does not allow data to be sent
efficiently across such an assembly, it will be impossible to
solve tasks efficiently and thus to stay competitive with
conventional design approaches.



FIG. 2: Scaling of the number of hops as a function of the
system size N = S. The α = 0 (global), α = 1.8 (standard),
and α = 1.8 (realistic) random multitudes show a logarithmic
scaling behavior. Average over 10 runs for RMs and over 7
runs for CAs.

For all six assemblies described in Section III B,
we varied the system size and measured the
average number of hops, which is proportional to the
average path length L, i.e., the number of edges in
the shortest path between two nodes, averaged over
all pairs of nodes, as for example used in [58]. The
different system sizes we used were: (1) N = S =
[9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64] for random
multitudes, (2) N = S = [9, 16, 25, 36, 49, 64, 81, 100, 121] for
2D CAs, and (3) N = S = [8, 27, 64, 125] for 3D CAs.
Shortest path routing was used.
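
The average path length L defined above can be computed with a breadth-first search from every node. The following is a minimal sketch, not the Matlab code used for the experiments. It assumes the topology is given as a dictionary that maps each node to the set of its neighbors, and it skips unreachable pairs. The function name and the toy ring are illustrative.

```python
from collections import deque

def average_hops(adj):
    """Mean shortest-path length (in hops) over all ordered pairs of
    distinct, mutually reachable nodes of an undirected graph."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # plain breadth-first search from source
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

# Hypothetical toy topology: a 1D ring of 6 nodes.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(average_hops(ring))
```

On the 6-node ring, every node sees the distances 1, 1, 2, 2, 3 to the other nodes, so the mean is 9/5 hops.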

As Figure 2 shows, the locally connected topologies,
i.e., the 2D and 3D CA as well as the local 3D random
multitude, scale up worse with system size than
the other three topologies. Not surprisingly, the globally
connected random multitude (α = 0) scales up best
because of the uniform rewiring probability over all nodes,
independent of the Euclidean distance between them.
The average path lengths of small-world graphs scale up
logarithmically with the number of nodes, which Figure
2 confirms. Note that there is only a very small
difference between the realistic random multitude and the
unrestricted α = 1.8 multitude. This is good news and
illustrates that the limited and thus more realistic
connectivity has little effect on the average number of hops
as the system is scaled up.






B. Experiment 2: Local versus Global Connections 

The distribution of the long- and short-distance
connections as a function of the Euclidean distance between
the nodes is a crucial parameter in our model. Clearly,
we are interested in high network performance with
a minimal number of global connections, which
are generally costly to establish. In this experiment, we
explore the network's communication characteristics as a
function of the parameter α and compare them with the
fixed 2D and 3D CA grid-like assemblies. The function
$l^{-\alpha}$, where l is the Euclidean distance between the two
switch nodes in question, describes the connectivity of
the random multitudes.

Figure 3 shows the average number of hops as a function
of the power-law exponent α. The 2D and 3D CA as
well as the locally connected and fixed random multitude
(α = 3) are plotted as a reference, although their values are
independent of α. As one can see, the number of hops
increases dramatically as α grows, i.e., the more local
the connectivity becomes. For comparison, the 1D ring
structure as used in the original Watts-Strogatz rewiring
model [58] is also shown. The 1D ring structure performs
worse with increasing α than the 3D random multitudes,
which offer a higher connectivity. For a value of α that
is slightly smaller than 2, both the unrestricted and the
realistic random multitude perform better than the 3D
regular assembly.

In our framework, the average latency is essentially
proportional to the average number of hops because we
keep the traffic injection rate low to avoid jamming.
Figure 3 also confirms the small-world characteristic of
the wiring, where the average path length (the average
number of hops in our case) drastically drops when a
few global connections are added (i.e., when α becomes
smaller). Figure 4 shows the clustering coefficient C [55]
of the 1D ring and the two 3D random multitudes. The
random multitudes have a very low clustering coefficient,
while the 1D ring behaves like the Watts-Strogatz model.
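
For readers unfamiliar with the clustering coefficient, the following sketch computes it in the common Watts-Strogatz sense: for every node, the fraction of pairs of its neighbors that are themselves linked, averaged over all nodes. It assumes this definition (the text above does not spell it out) and the convention that nodes with fewer than two neighbors contribute zero. The helper ring_lattice is a hypothetical stand-in for a 1D ring before any rewiring.

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Average, over all nodes, of the fraction of neighbor pairs
    that are directly linked to each other."""
    total = 0.0
    for node, nbs in adj.items():
        k = len(nbs)
        if k < 2:
            continue  # convention: such nodes contribute 0
        linked = sum(1 for u, v in combinations(nbs, 2) if v in adj[u])
        total += linked / (k * (k - 1) / 2)
    return total / len(adj)

def ring_lattice(n, k):
    """1D ring where every node links to its k nearest neighbors (k even)."""
    return {i: {(i + d) % n for d in range(-k // 2, k // 2 + 1) if d != 0}
            for i in range(n)}

print(clustering_coefficient(ring_lattice(10, 4)))
```

For the regular ring with k = 4, three of the six neighbor pairs of every node are linked, which gives C = 0.5. A sparse random wiring typically yields much smaller values.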

As Petermann and De Los Rios [15] find both
analytically and numerically, the small-world phenomenon in
a network built using a power-law decaying distribution
of shortcut lengths occurs when α < D + 1, where D
is the network's dimension. In the case of our random
multitudes, D = 3, so small-world behavior is expected
for α < 4, which is consistent with our observations.
Based on the results shown in Figure 3, we have chosen
α = 1.8 for the regular random multitudes (3DRMStandard
and 3DRMRealistic) for the following experiments.



C. Experiment 3: Number of Switch Nodes and 
Connectivity 

From a design perspective, one is obviously interested
in minimizing the number of switch nodes and the
connectivity among the switch nodes to a level that does not
significantly affect the system performance. In this
experiment, we explore the framework's characteristics by
varying the number of switch nodes S and the switch
node connectivity <k_s>, while keeping α constant.



FIG. 3: The average number of hops as a function of the
power-law exponent α. Average over 2 runs, shortest path
routing, N = S = 64.

FIG. 4: The clustering coefficient as a function of the
power-law exponent α. Average over 2 runs, shortest path routing,
N = S = 64.

Figure 5 shows that the average number of hops that
a message takes on the shortest path from any source
to any destination increases with the number of switch
nodes S. The fixed CA topologies are shown for comparison.
However, there is a trade-off between the number
of hops and the throughput a network can handle. A low
number of switch nodes limits the throughput, which we
cannot illustrate here because we have chosen a low traffic
injection rate that avoids jamming. Thus, depending
on what the network needs to be optimized for (i.e., lower
number of hops, throughput, short average path length,
etc.), one can make the appropriate choice for the number
of switch nodes. Obviously, the amount of hardware
resources and the volume required will also come into
play in reality.



FIG. 5: The average number of hops as a function of the
number of switch nodes S. Average over 2 runs, shortest
path routing, N = 64.



FIG. 6: The average path length as a function of the average
switch node connectivity <k_s>. Average over 2 runs,
shortest path routing, N = S = 64.

Figure 6 shows that the higher the switch node
connectivity <k_s>, the lower the average shortest path
length. The fixed CA topologies are shown for comparison.
Once again, one can observe only a small difference
between the unrestricted and the realistic random
multitude. Further results are summarized here:

• A higher switch node connectivity decreases both 
the average latency and the average number of 
hops. The throughput is only slightly improved. 

• The higher the number of switch nodes S, the 
higher the number of hops and the higher the av- 
erage latency. The lower S, the higher the average 
path length and the higher the throughput. 

• The higher the number of virtual channels C, the 
higher the node throughput (within the limits of 
the capacity of the physical links) and the lower the 
average latency. The average shortest path length 
is not affected by C. 

We can conclude that there are no "optimal" values for
connectivity, the number of switch nodes, and the number
of virtual channels. Instead, choosing the right values
is a matter of dependent trade-offs in the design space.
Local connections are very interesting from an
implementational point of view, but offer only reduced global
communication characteristics, which directly affects the
efficiency of problem solving (see also Section IV E). Adding
a few long(er)-distance connections with a probability that
decays with the distance between the nodes is physically
plausible and greatly improves the overall system performance
as well as the robustness, as we will see in the next section.



D. Experiment 4: Robustness against Link Failures 

Robustness against manufacturing defects and dynamic
failures is critical for future Avogadro-scale
self-assembled nano-scale architectures [18]. Due to the small
feature sizes, such systems are generally expected to be
much more prone to radiation-induced soft errors, to
thermal noise [3T], and to fabrication defects because of
the self-assembly processes.

In order to compare the robustness against link failures
of both regularly and irregularly interconnected
topologies, we randomly removed links between the
switch nodes in our six reference assemblies. It is well
known that small-world and scale-free networks are robust
against random failures of nodes and links [H HH].
As Figure 7 illustrates, this is also valid for our framework.
The average number of hops is plotted as a function
of the number of removed switch links. In this
experiment, we used random wandering to illustrate an
extreme case. As one can see, both the 2D and 3D CA
topologies start to perform worse, i.e., the average number
of hops increases, when the number of randomly removed
links approaches 40, while the random multitudes
essentially remain unaffected by the removed links.
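
The two ingredients of this experiment, random link removal and random wandering, can be sketched as follows. This is an illustration under assumptions rather than the simulation used for Figure 7: the 2D grid has no wrap-around at its borders, random wandering is modeled as a message that every node forwards to a uniformly chosen neighbor, and all function names are hypothetical.

```python
import random

def grid_2d(side):
    """side x side grid with 4-neighborhood and no wrap-around (assumed)."""
    adj = {(x, y): set() for x in range(side) for y in range(side)}
    for (x, y) in adj:
        for dx, dy in ((1, 0), (0, 1)):
            nb = (x + dx, y + dy)
            if nb in adj:
                adj[(x, y)].add(nb)
                adj[nb].add((x, y))
    return adj

def count_links(adj):
    return sum(len(nb) for nb in adj.values()) // 2

def remove_random_links(adj, k, rng):
    """Return a copy of adj with k randomly chosen undirected links removed."""
    links = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    pruned = {u: set(nb) for u, nb in adj.items()}
    for u, v in rng.sample(links, k):
        pruned[u].discard(v)
        pruned[v].discard(u)
    return pruned

def random_wander_hops(adj, src, dst, rng, max_hops=100000):
    """Hops needed when every node forwards the message to a random
    neighbor. Returns None if dst is not reached within max_hops."""
    node, hops = src, 0
    while node != dst:
        if not adj[node] or hops >= max_hops:
            return None
        node = rng.choice(sorted(adj[node]))
        hops += 1
    return hops

rng = random.Random(1)
damaged = remove_random_links(grid_2d(8), 40, rng)
print(count_links(damaged), random_wander_hops(damaged, (0, 0), (7, 7), rng))
```

Averaging random_wander_hops over many source and destination pairs, and over increasing numbers of removed links, would give a curve in the spirit of Figure 7.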

The random link failures admittedly represent a heavily
oversimplified fault model. Nevertheless, they illustrate
that the irregular small-world topologies of our framework
provide "robustness for free" to some extent, even
without using any specific fault detection and isolation
technique.



E. Experiment 5: Solving a Simple Task 

In this last experiment, we are interested in evaluating
the performance of solving a simple problem, which
is well-known in the area of cellular automata (CA): the
synchronization task. This "global" task is essentially
trivial to solve if one has a global view on the entire
system (i.e., if one has access to the state of all nodes at the
same instant in time), but it is non-trivial to solve for
locally connected cellular automata or random Boolean
networks (RBNs). Although it is commonly called a "toy
problem," the synchronization task actually has various
real-world applications, for example in sensor networks,
where one cannot assume global synchronization
and global signals, and thus special mechanisms are
required [38].



FIG. 7: The average number of hops as a function of the
number of randomly deleted links. Average over 2 runs, random
wandering, N = S = 64.

The synchronization task (also called firefly task) for
synchronous CAs was introduced by Das et al. [3] and
studied among others by Hordijk [27] and Sipper [ST]. In
this task, the two-state D-dimensional automaton, given
any initial configuration, must reach a final configuration
within M time steps that oscillates between all 0s and
all 1s on successive time steps. The whole automaton is
then globally synchronized.

Here, we use a slightly adapted version of the task for
our framework: we assume that each processing node in
our framework contains an oscillator whose frequency is
specified by a number $0 < f_{osc} < 1$. The modified
task then consists in finding a common frequency for
all oscillators. The algorithm is inspired by the averaging
algorithm as described in [3B]. Each processing node
state is initialized to a random value from the interval
[0, 1] before it repeatedly performs the following steps in
an asynchronous manner: (1) send the current oscillator
frequency to a random processing node; (2) if the current
node i receives a message from any other processing node
r, then average the own oscillator frequency $f_i$ with the
received frequency $f_r$; (3) set the own oscillator to this
frequency, $f_i = \frac{f_i + f_r}{2}$;
and (4) also send it to a new random processing node.
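
The averaging steps above can be sketched in a few lines. The sketch deliberately ignores the interconnect: messages are assumed to arrive instantly, whereas in the framework they have to travel over the switch network, which is what makes the topology matter. The function name, the number of steps, and the seed are illustrative assumptions.

```python
import random
import statistics

def synchronize(n=64, steps=5000, seed=0):
    """Asynchronous pairwise averaging of n oscillator frequencies.
    Returns the final frequencies and the standard deviation after
    every update (smaller means better synchronized)."""
    rng = random.Random(seed)
    freq = [rng.random() for _ in range(n)]  # random initial states in [0, 1]
    history = [statistics.pstdev(freq)]
    for _ in range(steps):
        sender = rng.randrange(n)            # step (1): pick a random target
        receiver = rng.randrange(n)
        if receiver == sender:
            continue
        # Steps (2) and (3): the receiver averages its own frequency
        # with the received one and adopts the result.
        freq[receiver] = (freq[receiver] + freq[sender]) / 2
        history.append(statistics.pstdev(freq))
    return freq, history

freq, history = synchronize()
print(history[0], history[-1])
```

The standard deviation of the node states is the measure plotted in Figure 8. In this idealized setting it shrinks quickly because any node can reach any other node in a single step.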

There are obviously numerous (also more efficient)
ways to solve this task, but here we are interested in an
illustrative comparison rather than in the absolute
performance values and limits. We compared how this simple
algorithm performed on the investigated interconnect
fabrics by using random wandering. As Figure 8
illustrates, the small-world topologies perform best. Both the
2D CA and the locally interconnected multitude converge
very slowly because of their poor global communication
characteristics. Not surprisingly, the globally connected
random multitude performs best.



FIG. 8: Performance of the synchronization task. The smaller
the standard deviation of the node state values, the better
the nodes are synchronized. The initial values for each curve
depend on the randomly initialized network state. N = S =
64, random wandering.

It has been shown elsewhere that irregular small-world
interconnects perform better on both the synchronization
(e.g., [2TJ HOE] mgi3], and many others) and the
density classification task (e.g., [UJ) than purely locally
interconnected topologies. However, the frameworks and
assumptions used in each approach are somewhat different
and sometimes not straightforward to compare. The
results of our framework merely confirm what has been
found theoretically elsewhere and in our two previous
experiments, namely that excellent communication
characteristics (i.e., short characteristic path length,
small latency, etc.) also help to solve tasks efficiently,
especially tasks which require a lot of global communication.
From an evolutionary perspective, this also seems to be
the reason why most natural networks, e.g., the brain
[19, 23, 52], have evolved with the small-world and
scale-free property. The relationship between efficient problem
solving and interconnection topologies has naturally also
preoccupied the parallel computing community since its
beginning (e.g., [37]).



V. CONCLUSION 

We have experimentally investigated in a pragmatic
way several relevant metrics of both regular and irregular,
realistic system-on-chip-like computing architectures
for self-assembled nanoscale electronics, namely 2D and
3D local-neighborhood as well as irregularly built
small-world interconnects with different distributions of
long-distance connections. The small-world architectures with
a power-law decaying distribution of shortcut lengths
investigated in our framework are physically plausible,
could likely be built very economically by self-assembling
processes, possess great communication characteristics,
and are robust against link failures. While
regular and local-neighborhood interconnects are easier
and more economical to build than interconnects with
lots of global or semi-global long-distance connections, we
have seen in the previous section that they are not as
efficient for global communication, which is very important
and directly affects how efficiently problems can be solved
with such architectures. Small-world networks with a
uniform distribution of long-distance connections or pure
Erdős-Rényi random networks, on the other hand, are
not physically plausible because one has to assume an
increasing wiring cost with distance. As our results have
shown by means of a simplistic, yet realistic framework,
small-world topologies with a power-law decaying
distribution of shortcut lengths offer a unique balance between
performance, robustness, physical plausibility, and
fabrication friendliness. In addition, it has been shown that
adaptive routing, which we have not explored in detail
here, is very efficient on small-world power-law graphs
133 EH.

We believe that computation in random self-assemblies
of components and interconnections (see e.g., [25, 36, 47,
56]) is a highly appealing paradigm, from the perspective
of fabrication as well as of performance and robustness.
This is obviously a radically new technological and
conceptual approach with many open questions. For
example, there are essentially no methodologies and tools
that would allow (1) mapping an arbitrary computing
architecture or a logical system onto a randomly assembled
physical substrate, (2) performing arbitrary computations with
such an assembly, and (3) systematically analyzing
performance and robustness within a rigorous mathematical
framework. There are also many open questions regarding
the self-assembling fabrication techniques, which will
need to be further explored in the future.